Aggregation Gain Reconsidered

1. Introduction

A large social science methodology literature has made clear the hazards to correct inference occasioned by using grouped data. Researchers in sociology (Robinson 1950; Goodman 1957; Blalock 1964; Hannan 1971), political science (Alker 1969; Shively 1970), economics (Theil 1954; Feige and Watts 1972), and education (Burstein 1975; Haney 1975) have all stressed the ways in which inferences from grouped observations may differ systematically from those drawn from analysis of individual (or, more generally, micro) data. There is ample evidence that the magnitude of the grouping or aggregation bias is likely to be large enough to produce very misleading findings.

Recent methodological treatments of the grouped estimation problem have focused on the ways in which the nature of the grouping process (the rule that allocates micro observations to groups) affects the divergence between grouped and ungrouped estimators (cf. Hannan and Burstein 1974; Burstein 1975). None of these treatments is general. Rather, they (following Blalock 1964) consider a variety of simple cases. These cases include random grouping and grouping that maximizes between-group variation in one of the variables in a structural equations model. In each of the cases studied, grouping leads to a loss of information and consequently to a loss of efficiency in estimation. Some types of grouping processes yield estimators that contain an aggregation bias, while others do not. In none of these cases is there any gain from grouping.

In an early and important paper, Grunfeld and Griliches (1960) proposed that grouping may in some cases lead to a gain. They considered the effect of grouping on estimators of $R^2$ from micro models that are improperly specified. They pointed out the possibility that the grouping bias might offset the specification bias in such a way that the $R^2$ calculated from grouped data might be closer to the true $R^2$ than that calculated from an incorrect micro model. Hannan and Burstein (1974) studied this issue with reference to estimators of structural parameters (e.g. path regressions). For the range of grouping cases they considered they found no evidence of any aggregation gain. They did identify cases where grouping magnifies specification bias in the micro model and others where there is no magnification, but none where there was a reduction. While their argument appears correct as it stands, it gives a misleading impression that grouping will never yield gains in terms of bias.

My purpose is to reopen the issue of aggregation gain. I will show that the simpler framework used by Hannan et al. (1973) to relate grouping effects to specification bias makes clear that aggregation gain is possible. Then I will explore two interesting cases.

2. Framework

At a minimum we must consider a (true) model and two alternative estimators: ungrouped and grouped. We want to compare the properties of the estimators (here only their means) under various types of grouping processes and various types of model misspecification. For example, consider the following model:

    $$y = \beta_1 x_1 + \beta_2 x_2 + u \qquad (1)$$

or

    $$y = X\beta + u$$

where $x_1$ and $x_2$ are stochastic regressors,¹ and

    $$\mathrm{plim}\,(\tfrac{1}{N} u'u) = \sigma_u^2$$
    $$\mathrm{plim}\,(\tfrac{1}{N} X'X) = \Sigma$$
    $$\mathrm{plim}\,(\tfrac{1}{N} X'u) = 0$$

(where plim denotes the probability limit; cf. Johnston 1972: 268-281).






We consider the usual ungrouped ordinary least squares estimator

    $$\hat{\beta} = (X'X)^{-1}X'y \qquad (2)$$

and a grouped estimator

    $$\bar{\beta} = (\bar{X}'\bar{X})^{-1}\bar{X}'\bar{y} \qquad (3)$$

where the bars over vectors and matrices indicate that they contain grouped observations. Each type of grouping process determines a grouped estimator.



We might define asymptotic grouping bias as

    $$\mathrm{plim}\,(\bar{\beta} - \hat{\beta}).$$

But this is a meaningful criterion only when the ungrouped estimator is asymptotically unbiased. More generally, we must consider the possibility that $\hat{\beta}$ is biased. To be concrete, we treat the estimator of $\beta_1$ that ignores the presence of $x_2$ in the model:

    $$\hat{\beta}_1 = \frac{x_1'y}{x_1'x_1} \qquad (4)$$

As long as $\mathrm{plim}\,(\tfrac{1}{N} x_1'x_2) \neq 0$, $\hat{\beta}_1$ is an inconsistent estimator of $\beta_1$ (Theil 1957).

The specification bias of $\hat{\beta}_1$ is defined as

    $$\mathrm{plim}\,(\hat{\beta}_1 - \beta_1) = \beta_2 \frac{\sigma_{12}}{\sigma_1^2}.$$



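A grouped analogue to (4) is

    $$\bar{\beta}_1 = \frac{\bar{x}_1'\bar{y}}{\bar{x}_1'\bar{x}_1} \qquad (5)$$

with

    $$\mathrm{plim}\,(\bar{\beta}_1 - \beta_1) = \beta_2 \frac{\sigma_{\bar{1}\bar{2}}}{\sigma_{\bar{1}}^2}$$

where $\sigma_{\bar{1}\bar{2}}$ and $\sigma_{\bar{1}}^2$ denote population covariances and variances under the given grouping rule.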




Finally, we want to compare the grouped estimator (5) with the ungrouped estimator (4). As we have constructed the example, both estimators are inconsistent. The question of aggregation gain concerns the possibility that the asymptotic bias may be smaller in the grouped estimator. A natural definition of aggregation gain is (cf. Grunfeld and Griliches 1960; Hannan and Burstein 1974):

    $$|\mathrm{plim}\,(\bar{\beta} - \beta)| < |\mathrm{plim}\,(\hat{\beta} - \beta)| \qquad (6)$$

To evaluate expressions like (6) we must take explicit account of the nature of the grouping process. In the cases we wish to consider the grouping rule (more formally, the grouping matrix in Prais and Aitchison's (1954) terminology) is stochastic. That is, the rule that places individuals in groups utilizes the outcome of some stochastic process (e.g. places individuals in groups on the basis of their value on one or more of the variables in the model). This complication makes it very difficult to obtain exact expressions for (6). We continue to use large sample results (probability limits) and particularly Monte Carlo results to evaluate (6) for the cases of interest. The relevant Monte Carlo results are presented more fully in Hannan et al. (1975) and Hannan and Young (1976). The most important finding is that the simulation results for modest sized samples (N = 500 grouped by 10's) agree closely with the large sample analytic results.

3. Aggregation Gain: Omitted Variables

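The first case we investigate is that set out in the previous section: specification bias due to the omission of a causal variable related to the included causal variable. The ungrouped estimator from (4) is biased and inconsistent:

    $$\mathrm{plim}\,\hat{\beta}_1 = \beta_1 + \beta_2 \frac{\sigma_{12}}{\sigma_1^2}$$

as is the grouped estimator:

    $$\mathrm{plim}\,\bar{\beta}_1 = \beta_1 + \beta_2 \frac{\sigma_{\bar{1}\bar{2}}}{\sigma_{\bar{1}}^2}$$

So aggregation gain requires that

    $$\left|\frac{\sigma_{\bar{1}\bar{2}}}{\sigma_{\bar{1}}^2}\right| < \left|\frac{\sigma_{12}}{\sigma_1^2}\right| \qquad (7)$$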

As we reported earlier (Hannan et al. 1975) none of the commonly studied cases meets this criterion. However, from (7) it is clear that certain types of grouping rules will yield an aggregation gain. For example, any grouping process that eliminates the covariance of $x_1$ and $x_2$ in the grouped data will yield such a gain. What would such a grouping process look like? One simple case is a grouping that eliminates between-group variation in $x_2$. Then the grouped estimator is unbiased, while the ungrouped estimator is not.

This case is not completely artificial. Consider the following concrete example. Davis (1966) proposed that student aspirations for additional educational attainment respond to a "frog-pond" effect. The higher the level of performance of one's peers, holding constant one's own performance level, the lower are one's aspirations. Suppose that the effect operates more precisely as follows. Let aspirations ($y$) depend linearly on performance level ($x_1$) and rank in class ($x_2$). Analyses that ignore $x_2$ will give biased estimates of the performance effect when individual data are used but not when class averages are employed.

There is a general class of situations of which this one is an example. Rank in class is a variable defined relative to some bounded system (a "relational variable" in Lazarsfeld and Menzel's (1965) terminology). Whenever such variables are omitted from a model and the grouping corresponds with the system boundaries (so that there is no between-group variance in the relational variable), grouping will produce a gain.
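
The frog-pond argument can be illustrated with a small simulation. The following is a minimal sketch in Python, not a reproduction of any simulation cited here. It assumes normally distributed performance, equal-sized classes, and rank rescaled so that its mean is identical in every class; the function name and all parameter values are hypothetical.

```python
import numpy as np

def frog_pond_demo(n_classes=200, class_size=25, beta1=1.0, beta2=2.0, seed=0):
    """Compare the ungrouped and class-mean estimators of beta1 when rank (x2) is omitted."""
    rng = np.random.default_rng(seed)
    # Performance x1 = class-level component + individual deviation (assumed normal).
    class_level = rng.normal(0.0, 1.0, size=(n_classes, 1))
    x1 = class_level + rng.normal(0.0, 1.0, size=(n_classes, class_size))
    # Rank within class, rescaled to [-0.5, 0.5]; its mean is zero in every class,
    # so there is no between-group variation in x2.
    ranks = x1.argsort(axis=1).argsort(axis=1)
    x2 = ranks / (class_size - 1) - 0.5
    u = rng.normal(0.0, 1.0, size=x1.shape)
    y = beta1 * x1 + beta2 * x2 + u

    def slope(x, y):
        xc = x - x.mean()
        yc = y - y.mean()
        return float(xc @ yc / (xc @ xc))

    b_ungrouped = slope(x1.ravel(), y.ravel())             # individual data, x2 omitted
    b_grouped = slope(x1.mean(axis=1), y.mean(axis=1))     # class averages, x2 omitted
    return b_ungrouped, b_grouped

b_ungrouped, b_grouped = frog_pond_demo()
print(f"ungrouped: {b_ungrouped:.3f}  grouped: {b_grouped:.3f}  (true beta1 = 1.0)")
```

Because within-class rank is positively related to performance, the individual-level slope absorbs part of the rank effect, whereas the class averages of rank are constant and drop out of the grouped regression.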

4. Aggregation Gain: Measurement Error

The Hannan-Burstein and Grunfeld-Griliches analyses presume perfect measurement of causal variables. In this section we address the possibility of aggregation gain in simple models in which the causal variables are measured with random error. In particular, we use the following model:

    $$y = \beta x + u$$
    $$x^* = x + e$$
    $$\mathrm{plim}\,(\tfrac{1}{N}\textstyle\sum xu) = \mathrm{plim}\,(\tfrac{1}{N}\textstyle\sum xe) = \mathrm{plim}\,(\tfrac{1}{N}\textstyle\sum ue) = 0.$$

The ungrouped estimator of interest is

    $$\hat{\beta} = \frac{\sum x^* y}{\sum x^{*2}}$$

and (cf. Johnston 1972: 282)

    $$\mathrm{plim}\,\hat{\beta} = \frac{\beta}{1 + \sigma_e^2/\sigma_x^2}.$$

That is, the ungrouped estimator contains an asymptotic specification bias that depends on the ratio of measurement error variance to true score variance.

Next, we consider two grouping processes and the resulting grouped estimators: (1) grouping that maximizes grouped true score variance; and (2) grouping that maximizes observed score variance.

A. Grouping that maximizes grouped true score variance

From our previous work, we know that grouping that maximizes grouped variation in $x$ is random with respect to $e$. In fact, under these conditions we found that

    $$\sigma_{\bar{x}}^2 \cong \sigma_x^2 \quad \text{and} \quad \sigma_{\bar{e}}^2 \cong \sigma_e^2/n$$

where $n$ is the size of each group (assuming equal-sized groups). Using these results as an approximation we have

    $$\mathrm{plim}\,\bar{\beta} \cong \frac{\beta}{1 + \sigma_e^2/(n\sigma_x^2)}.$$


Clearly, with these approximations, there is an aggregation gain. For example, in our simulation with the reliability of $x^*$ = .7 and groups of size 10,

    $$\mathrm{plim}\,\hat{\beta} \cong .54\beta \; ; \quad \mathrm{plim}\,\bar{\beta} \cong .92\beta$$

and when the reliability is .3,

    $$\mathrm{plim}\,\hat{\beta} \cong .18\beta \; ; \quad \mathrm{plim}\,\bar{\beta} \cong .69\beta.$$

As we would expect, the lower the reliability the greater the aggregation gain.
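
The pairs of figures quoted above can be checked against the approximation plim(grouped) ≅ β / (1 + σ_e²/(nσ_x²)). The ungrouped factor 1/(1 + λ) fixes the ratio λ = σ_e²/σ_x², and dividing λ by the group size n gives the grouped factor. A minimal arithmetic sketch in Python follows; the function names are illustrative and do not come from the original.

```python
def error_ratio_from_ungrouped(factor):
    """Recover lambda = sigma_e^2 / sigma_x^2 from plim(beta_hat)/beta = 1 / (1 + lambda)."""
    return 1.0 / factor - 1.0

def grouped_factor(factor, n):
    """Approximate plim(beta_bar)/beta for groups of size n formed on true scores."""
    lam = error_ratio_from_ungrouped(factor)
    return 1.0 / (1.0 + lam / n)

for ungrouped in (0.54, 0.18):
    print(ungrouped, round(grouped_factor(ungrouped, 10), 2))
# prints:
# 0.54 0.92
# 0.18 0.69
```

Both reported grouped values follow from the corresponding ungrouped values once the error-variance ratio is divided by the group size of 10.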

These sorts of considerations prompted Wald (1940) and Bartlett (1949) to propose certain grouped estimators as improvements over the usual ungrouped estimators. They failed to realize, however, that the estimators they proposed are consistent only when the observations are grouped by true scores (Neyman and Scott 1954). I have not yet been able to identify a realistic situation in educational research in which observations are grouped by true scores.² Therefore, it is important to investigate the consequences of grouping by observed scores ($x^*$).

B. Grouping by fallibly measured scores

In realistic situations, observations are grouped by observed scores. Here we consider the analogue to the case just discussed, namely, grouping that maximizes between-groups variation in $x^*$. An additional complication arises in this case since according to the model $x^*$ is endogenous (causally dependent). As Blalock (1964) and others have demonstrated, grouping by values of endogenous variables tends to produce a (positive) correlation between regressors and disturbances even when they are independent in the ungrouped data (cf. Hannan and Young 1976 for Monte Carlo evidence on this). As a consequence we cannot presume in this case that grouped true scores ($\bar{x}$) and grouped measurement errors ($\bar{e}$) will be uncorrelated (even asymptotically). That is, the grouped estimator has the following asymptotic bias:



The comparison of grouped and ungrouped estimators is more complicated than in the previous case. Asymptotic aggregation gain requires that



As far as I have been able to determine, this condition is not inconsistent with the model specification and grouping process. Our simulation (conducted only on three variable models) does not yield the quantities necessary to evaluate the possibility of aggregation gain in small samples for this type of grouping.

Blalock et al. (1970) report a Monte Carlo study that is relevant here. They compared the behavior of the Wald and Bartlett estimators (with data grouped by observed scores) with the ungrouped ordinary least squares estimator. These grouped estimators are different from the estimator just discussed but are roughly analogous. At any rate, Blalock et al. found no gain over the ungrouped estimator. In all cases simulated, the behavior of the Wald and Bartlett estimators was quite similar on the average to the ungrouped estimator. We will shortly revise our simulation to conduct a systematic study of the question of aggregation gain under these conditions.
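
The contrast between grouping on true scores and grouping on observed scores can be explored with a small Monte Carlo sketch in Python. It assumes a bivariate model with normal variables, unit true-score variance, and groups formed by sorting on the grouping variable. This is an illustrative design, not the three-variable simulation referred to in the text, so its numbers need not match those reported there. Function names and parameter values are hypothetical.

```python
import numpy as np

def slope(x, y):
    """Bivariate least squares slope of y on x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def grouped_slope(key, x_obs, y, n):
    """Sort cases by `key`, form consecutive groups of size n, regress group means."""
    order = np.argsort(key)
    xg = x_obs[order].reshape(-1, n).mean(axis=1)
    yg = y[order].reshape(-1, n).mean(axis=1)
    return slope(xg, yg)

def errors_in_variables_demo(N=20000, n=10, beta=1.0, reliability=0.7, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)                                  # true scores, variance 1
    sigma_e = np.sqrt((1.0 - reliability) / reliability)    # gives the stated reliability
    x_obs = x + rng.normal(scale=sigma_e, size=N)           # fallible measure x*
    y = beta * x + rng.normal(size=N)
    return {
        "ungrouped": slope(x_obs, y),
        "grouped_by_true": grouped_slope(x, x_obs, y, n),
        "grouped_by_observed": grouped_slope(x_obs, x_obs, y, n),
    }

for name, value in errors_in_variables_demo().items():
    print(f"{name}: {value:.3f}")
```

Under these particular assumptions the estimator grouped on true scores lies much closer to β than the ungrouped estimator, while the estimator grouped on observed scores stays close to the ungrouped one. That pattern is in line with the Blalock et al. finding of no gain, but it does not settle the question for other models or grouping rules.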

Finally, we note an interesting attempt by Aigner and Goldfeld (1974) to explicate the original Grunfeld-Griliches argument. As in most of the economics literature, the problem is viewed from the perspective that the micro relations differ from individual to individual. Consequently, if there are N individuals, there are N structural equations to be estimated. We have been considering the simpler case where all micro units are assumed to behave according to the same structural relationship. Aigner and Goldfeld do treat this problem as a special case. In so doing, they pose a clear example of aggregation gain. The micro model has the form (a time series):

    $$y_1 = \beta x_1 + u_1$$
    $$y_2 = \beta x_2 + u_2$$

where $y_1$ and $y_2$ are unobserved, and are replaced by indicators measured with random error. In this extreme case, the random measurement errors are equal and opposite in sign:

    $$y_1^* = y_1 + e$$
    $$y_2^* = y_2 - e.$$

The grouped model is formed by aggregating the two micro relations. Note that the random errors cancel in the grouped data. Consequently, the grouped estimator will be consistent while ungrouped ordinary least squares estimators will not.

The example is obviously artificial. Nonetheless, it does give a clear indication that under at least some conditions grouping may lead to an aggregation gain in models that suffer from errors in variables even when one cannot group by true scores.

5. Conclusions

The various social science methodology literatures agree on the costs of grouping. One always loses information in grouping. Moreover, in a wide variety of situations grouping introduces systematic error. For most educational research applications the existing guidelines are probably appropriate. There is, however, a class of situations in which grouping (of a particular type) will tend to compensate for errors in the original specification. That is, there are certain situations in which grouping produces a gain.

We have made the case for aggregation gain by examining two special cases. The first involves grouping that minimizes (grouped) variation in confounding variables. Obviously if grouping can eliminate such variation it may improve inference. We have shown that when the confounding variables are relational (defined relative to the group), grouping may yield a gain. The second case concerns the effect of grouping on measurement error. As has been widely recognized, grouping by true scores will yield a gain relative to estimators that employ ungrouped fallibly measured variables. The more realistic case in which observations are grouped by observed scores is more complicated. However, it appears that aggregation gain is possible in this case as well. At least we cannot on the basis of existing results rule out this possibility.

In summary, we argue against overgeneralizing the results on the costs of aggregation. Whether grouping yields costs or gains cannot be determined without knowledge of the process that groups observations and the nature of the substantive problem and research design. No methodological guideline substitutes for careful scrutiny of each application.

Footnotes

¹ Much of the literature on grouped estimation considers nonstochastic regressors. Since, as we point out below, the grouping matrices we consider are stochastic, the grouped regressors are stochastic. As a consequence, there is nothing to be gained by preserving the assumption that the ungrouped regressors are fixed. The presence of stochastic regressors forces us to use weaker results than for the fixed case. In particular, we examine probability limits of estimators.

² I presume that the true scores are unknown. Otherwise a rational investigator would not use the ungrouped estimator considered here.